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Abstract 

A  stochastic  program  SP  with  solution  value  z*  can  be  approximately  solved  by  sampling  n  realizations  of  the  program’s 
stochastic  parameters,  and  by  solving  the  resulting  “approximating  problem”  for  We  show  that,  in  expectation,  z^ 

is  a  lower  bound  on  z*  and  that  this  bound  monotonically  improves  as  n  increases.  The  first  result  is  used  to  construct 
confidence  intervals  on  the  optimality  gap  for  any  candidate  solution  x  to  SP,  e.g.,  x  =  x'^.  A  sampling  procedure  based  on 
common  random  numbers  ensures  nonnegative  gap  estimates  and  provides  significant  variance  reduction  over  naive  sampling 
on  four  test  problems,  (c)  1999  Elsevier  Science  B.V.  All  rights  reserved. 

Keywords:  Programming;  Stochastic;  Monte  Carlo  approximations;  Statistics;  Sampling;  Confidence  intervals;  Variance 
reduction 


1.  Introduction 

This  paper  develops,  analyzes  and  computationally 
tests  a  new,  probabilistic  lower  boimd  for  stochastic 
programs.  The  bound  is  used  to  determine  the  qual¬ 
ity  of  candidate  solutions.  Specifically,  we  construct 
confidence  intervals  on  the  corresponding  “optimality 
gap”  which  is  the  dilference  in  objective  values  be¬ 
tween  a  candidate  solution  and  an  optimal  solution. 
The  bounding  methodology  is  based  on  solving  an 
approximating  problem  generated  by  a  Monte  Carlo 
sampling  of  the  random  parameters. 
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We  consider  a  stochastic  optimization  problem  of 
the  form 

SP  z*=m\viEf{x,^)  with  j:*  G  argmin  £'/(x,  I), 
■^62^  xev 

where  /  is  a  real-valued  function,  x  is  a  vector  of 
decision  variables  with  deterministic  feasible  set  X, 
and  I  is  a  vector  of  random  variables.  We  are  also 
concerned  with  an  associated  approximating  problem 

1  " 

SP„  z*  =  min  -  5^  / ( j:,  )  with 

i=i 

1  " 

xl  G  argmin  -  V/(x,|'), 
xev  n  ^ 
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where  /=  1, are  independent  and  identically 
distributed  (i.i.d.)  from  the  distribution  of  Through¬ 
out  this  paper  we  assume  that  the  first  and  second  mo¬ 
ments  of  f{x,  exist  for  all  x  €  X. 

Unless  the  random  vector  |  has  a  small  number 
of  possible  realizations  (also  called  “scenarios”),  it  is 
usually  impossible  to  solve  SP  exactly.  One  standard 
approach  for  approximately  solving  SP  is  to  use  a 
Monte  Carlo  sampling  procedure  to  generate  n  obser¬ 
vations  i  =  and  then  solve  (a  realization 

of)  the  approximating  problem.  This  paper  is  primar¬ 
ily  concerned  with  using  approximating  problems  to 
determine  the  quality  of  a  candidate  solution  jc  to  SP. 
We  discuss  various  methods  for  generating  x  below. 

An  important  special  case  of  SP,  and  the  one 
upon  which  our  computational  results  are  based,  is 
the  two-stage  stochastic  LP  (linear  program)  with 
recourse  [3,  36]  in  which 

f(x,  i)  =  cx  +  min  gy 

jSsO 

s.t.  Dy  —  Bx  +d,  (1) 

and  X  is  a  polyhedral  set.  Here,  |  =  vec(d,g,B,D), 
where  the  “vec”  operator  reads  its  arguments  column¬ 
wise  to  form  a  single  vector.  In  practice,  only  a  subset 
of  the  parameters  in  ^  are  random. 

The  method  of  approximately  solving  SP  by  solv¬ 
ing  a  realization  of  SP„  relies  on  “external  sampling”, 
i.e.,  the  sampling  is  performed  external  to  (prior  to) 
the  solution  procedure.  This  approach  is  justified  by 
the  theory  of  epi-convergence:  Under  certain  assump¬ 
tions,  converges  to  z*,  w.p.l.,  and  accumu¬ 
lation  points  of  are  optimal  solutions  to  SP, 

w.p.l.  See  [9,  21,  22,  33]  for  these  results  and  addi¬ 
tional  results  concerning  rates  of  convergence. 

Other  Monte  Carlo  solution  procedures  for  solving 
SP  use  “internal  sampling”.  These  procedures  include 
sampling-based  cutting-plane  methods  [4,  16,  19]  and 
stochastic  quasi-gradient  algorithms  [10,  11],  both  of 
which  are  adaptations  of  deterministically  valid  al¬ 
gorithms  in  which  exact  function  and  (sub-)gradient 
evaluations  are  replaced  with  Monte  Carlo  estimates. 
Sampling  is  “internal”  because  observations  are  gen¬ 
erated  as  the  algorithm  proceeds.  Desirable  asymptotic 
properties  can  be  obtained  for  both  internal  and  exter¬ 
nal  sampling  methodologies  but,  of  course,  candidate 
solutions  X  are  generated  using  a  finite  number  of  it¬ 
erations  and/or  observations.  In  either  methodology, 


it  is  dilficult  to  ascertain  the  quality  of  such  finitely 
generated  candidate  solutions. 

For  the  computational  results  in  this  paper,  we  gen¬ 
erate  a  candidate  solution  x  by  solving  approximating 
problems  of  the  form  SP„.  However,  our  methods  can 
also  be  applied  to  test  the  quality  of  a  candidate  so¬ 
lution  generated  by  any  means  such  as  the  algorithms 
cited  above  or  heuristic  procedures. 

One  obvious  approach  to  testing  solution  quality  is 
to  bound  the  optimality  gap,  defined  as  E f{x,  |)  —  z* ; 
this  is  the  approach  we  take.  For  two-stage  problems,  it 
is  straightforward  to  estimate  Ef{x,^)  using  standard 
statistical  procedures.  Thus,  the  optimality  gap  is  easy 
to  bound  given  a  good  lower-bound  estimator  for  z*. 
Dantzig  and  Infanger  [6]  and  Higle  and  Sen  [14]  use 
Monte  Carlo  versions  of  lower  bounds  obtained  in 
adaptations  of  deterministic  cutting-plane  algorithms. 
Higle  and  Sen  [17]  have  also  proposed  a  statistical 
lower  bound  that  is  rooted  in  duality.  These  bounds 
are  discussed  in  greater  detail  in  Section  2.2. 

In  this  paper,  we  show  that  Ez*  ^£'z*_|_[  ^z*.  Thus, 
z*  provides  a  probabilistic  lower  bound  on  z*  that  im¬ 
proves  (in  expectation)  with  increasing  sample  size. 
We  use  this  result,  in  a  batch-means  procedure,  to  pro¬ 
duce  confidence  intervals  on  the  optimality  gap  with 
respect  to  any  candidate  solution  x.  The  method  we 
propose,  like  the  method  of  [17],  has  the  advantage 
that  it  is  independent  of  specific  solution  procedures. 
Our  bound  estimator  is  also  unbiased,  i.e.,  for  any  n 
there  is  a  deterministic  constant  0„  ^z*  with  Ez*  =  0„. 
Consequently,  this  estimator  is  well  suited  for  con¬ 
structing  confidence  intervals. 

Before  proceeding,  we  note  that  there  are  two  other 
general  approaches  that  one  might  consider  using  for 
testing  or  ensuring  solution  quality  for  stochastic  pro¬ 
grams.  One  approach  [34]  tests  the  null  hypothesis 
“the  Karush-Kuhn-Tucker  (KKT)  conditions  are  sat¬ 
isfied”.  But,  when  a  stochastic  program  is  solved  by 
a  statistical  or  heuristic  procedure,  a  suboptimal  solu¬ 
tion  is  virtually  assured  and  this  approach  can  say  lit¬ 
tle  about  the  quality  of  such  solutions.  (Higle  and  Sen 
[15],  however,  derive  a  bound  on  the  optimality  gap 
that  is  motivated  by  the  KKT  conditions.)  Another  ap¬ 
proach  uses  the  limiting  distribution  of  \/n{x*  —  x*  ) 
to  construct  confidence  intervals  for  jc* .  Under  certain 
conditions,  asymptotic  normality  has  been  verified  for 
the  iterates  of  specific  stochastic  approximation  pro¬ 
cedures,  for  example,  the  Robbins-Monro  procedure; 
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see  the  survey  in  Pflug  ([27],  Section  5).  However, 
this  distribution  is  usually  non-normal  because  of  the 
constraints  x  GX  [8,  21,  32].  The  method  we  propose 
is  simple  and  requires  only  mild  assumptions:  We  as¬ 
sume  that  (i)  /(jc,  I)  has  finite  mean  and  variance, 
(ii)  i.i.d.  observations  of  ^  can  be  generated,  (iii)  in¬ 
stances  of  SP„  can  be  solved  for  sufficiently  large  n  to 
yield  “good”  bounding  information,  and  (iv)  / (x,  |) 
can  be  evaluated  exactly  for  specific  values  of  x  and 
realizations  of 

In  the  next  section,  we  review  standard  Monte 
Carlo  upper  (pessimistic)  bounds  and  develop  our 
new  Monte  Carlo  lower  (optimistic)  bounds.  In 
Section  3,  we  show  how  to  use  these  probabilistic 
bounds  to  obtain  approximate  confidence  intervals  on 
the  quality  of  a  candidate  solution;  we  do  this  in  a 
naive  fashion  but  also  using  the  variance-reduction 
technique  of  common  random  numbers.  Section  4 
provides  computational  results  on  four  two-stage 
stochastic  linear  programs  from  the  literature,  and 
Section  5  gives  a  brief  conclusion  and  mentions  areas 
for  further  research. 


2.  Monte  Carlo  bounds 

2.1.  Upper  hounds 


Here,  =4*  denotes  convergence  in  distribution,  and 
N(0,  fT^)  is  a  normal  random  variable  with  mean  zero 
and  variance  This  CLT,  coupled  with  s^{n),  the 
standard  sample  variance  estimator  of  enables 
the  construction  of  confidence  intervals  for  Ef(x,^). 
Unbiasedness  and  asymptotic  normality  can  also  be 
achieved  using  variance-reduction  techniques  such  as 
importance  sampling  [4,  19]. 

2.2.  Lower  bounds 


Here  we  state  our  main  lower-bounding  results  and 
discuss  related  lower  bounds  from  the  literature. 


Theorem  1.  Let  . . . ,  be  i.i.d.  from  the  distribu¬ 
tion  of  |.  Then, 


Ez*  =  E  min 
xex 


1 

n 


i=\ 


^  * 


Proof. 

1  " 

min  Ef{x,  ^)  =  min  V  f{x,  ?') 

x^X  x^X  Tl  *  ^ 

i=\ 

^i?min^^/(x,|‘).  □ 

xex  n 

1=1 


Suppose  that  we  have  used  a  procedure,  possibly 
heuristic,  to  find  a  “good”,  but  probably  suboptimal 
solution  X  G  X  for  a  stochastic  program  SP.  We  can 
estimate  E f(x,  |)  (the  expected  cost  of  operating  our 
“system”  with  a  suboptimal  decision  vector  x  =  x)  via 
the  standard  sample  mean  estimator 

U(n)=^-^fix,n 

/=i 

where  are  i.i.d.  from  the  distribution  of 

This  estimator  has  two  important  properties:  It  is  an 
unbiased  estimator  of  the  true  cost  of  a  suboptimal 
decision  x,  i.e., 

EU{n)  =  Efix,i)^z*,  (2) 

and  it  satisfies  the  following  central  limit  theorem 
(CLT): 

s/n[U(n)  —  Ef(x,^)]  ^  N(0,  (t„)  as  «  ^  oo 

where  al  =  var  f(x,  |).  (3  ) 


Just  as  Eqs.  (2)  and  (3)  lead  to  confidence  inter¬ 
vals  on  an  upper  bound  on  z*,  Theorem  1,  exploited 
in  a  batch-means  approach,  will  lead  to  confidence  in¬ 
tervals  on  a  lower  bound  for  z* .  Section  3  combines 
these  results  to  bound  the  optimality  gap.  Importantly, 
Theorem  1  requires  little  in  terms  of  the  structure  of 
SP:  E f{x,  I)  must  exist  Vx  e  X,  but  X  need  not  be 
convex  and  E f{x,  |)  need  not  be  convex,  imimodal, 
or  smooth.  So,  the  subproblem  defined  by  /  could, 
for  example,  contain  integer  decision  variables. 

We  have  stated  Theorem  1  using  i.i.d.  samples,  but 
the  conclusion  of  the  theorem  holds  for  any  unbiased 
estimator.  In  particular,  minjgjf  [l/«^J^j /(x,  |')] 
can  be  replaced  with  min^gj^  (x,  |' , . . . ,  |” )  pro¬ 
vided  £'J^(x, ^',...,^")  =  Ef(x,i).  Such  general¬ 
izations  are  useful  when  the  lower-bound  estimator 
is  constructed  using  variance-reduction  techniques  in 
which  the  observations  are  not  i.i.d.,  for  example, 
when  using  antithetics,  stratified  sampling,  etc.,  or 
when  /(x,  I)  is  effectively  altered  as  with  importance 
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sampling  and  some  control-variate  schemes.  (See,  for 
example,  Law  and  Kelton  [24]  for  a  discussion  of 
these  variance-reduction  techniques.)  Korf  and  Wets 
[23]  have  similarly  extended  consistency  results  for 
SP„  when  the  approximating  problem  is  constructed 
from  non-i.i.d.  but  stationary  sequences  of  observa¬ 
tions. 

We  note  that  when  «  =  1,  a  special  case  of  the  theo¬ 
rem  is:  z*  mirijc^x  f  ^)-  This  is  the  well-known 
“wait-and-see  bound”  of  Madansky  [25].  The  intu¬ 
ition  behind  the  wait-and-see  bound  and  the  bound  of 
Theorem  1  is  similar.  In  solving  the  original  problem 
SP,  we  must  find  a  decision  that  hedges  against  all  pos¬ 
sible  realizations  of  When  calculating  these  lower 
bounds,  we  optimize  over  a  subset  of  |’s  support.  Be¬ 
cause  of  this  “inside  information”,  we  over-optimize 
and,  on  average,  obtain  an  optimistic  objective  value. 
Based  on  this  same  intuition,  we  expect  that  the  value 
(and  hence  the  quality)  of  the  boimd  to  grow  as  n  in¬ 
creases.  Theorem  2  has  been  verified  independently 
by  Norkin  et  al.  [26]. 

Theorem  2.  Let  , . . . ,  be  i.  i. d.  from  the  dis¬ 

tribution  of  ^  and  be  used  to  define  z*  and  z*_|_[ .  Then, 
Fv*  >  Fz* 


Proof. 

I  «+l 

i=  1 


=  E  min 


Monotonicity  is  a  desirable  property  because  it  indi¬ 
cates  that,  on  average,  better  lower  bounds  and  thus 
tighter  confidence  intervals  will  be  obtained  as  sam¬ 
ple  sizes  increase.  The  property  is  not  necessary  for 
our  computational  procedures,  however. 

There  are  a  number  of  related  results  in  the  liter¬ 
ature.  Birge  [1]  introduces  a  class  of  deterministic 
boimds  for  two-stage  stochastic  LPs  that  is  based 
on  solving  all  possible  groups  of  n-scenario  prob¬ 
lems.  In  finance,  Broadie  and  Glasserman  [2]  de¬ 
velop  pessimistic  and  optimistic  estimators  for  the 
prices  of  American-style  securities;  the  optimistic 
bound  is  found  by  solving  a  stochastic  dynamic 
program  on  a  simulated  scenario  tree.  Norkin  et  al. 
[26]  use  pessimistic  and  optimistic  bounds  within 
a  branch-and-bound  algorithm  for  stochastic  global 
optimization. 

We  may  reformulate  the  approximating  problem  us¬ 
ing  explicit  “non-anticipativity”  constraints  x‘  =x  for 
all  i,  with  associated  Lagrange  multipliers  1‘,  and  per¬ 
form  a  Lagrangian  relaxation  of  these  constraints  to 
obtain 

z*=mm  -'^f{x\^‘) 

L  i=l 

s.t.  G  X,  /  = 

x‘  =x:A\ 

^  min  ^  V  fix’  X‘x‘ 

n 

L  i=l 

s.t.  x^GX,  (4) 


«+i  y 

n 

j 

via  conjugate  duality  in  [17].  The  boimd  is  weaker 
=  Ez*.  □  in  expectation  than  Ez*,  but  the  associated  opti¬ 

mization  problem  separates  by  scenario  which  is 
The  monotonicity  result  of  Theorem  2  can  also  be  computationally  advantageous.  Techniques  are  also 

verified  under  more  general  hypotheses.  For  example,  proposed  in  [17]  to  generate  multipliers  X‘  and  to 

if  we  use  one  stream  of  i.i.d.  random  vectors,  .  use  the  lower-bound  estimate  to  test  the  quality  of  a 

to  define  z*  and  another  stream  of  i.i.d.  vectors,  say,  candidate  solution  jc. 

I I  ,  to  define  z*^^,  these  streams  can  contain  By  taking  the  dual  of  the  second-stage  problem,  the 

common  random  variables  (as  in  the  current  statement  two-stage  stochastic  LP  (see  Eq.  ( 1 ))  can  be  rewritten 
of  this  theorem),  can  be  independent,  or  otherwise.  in  a  manner  which  suggests  the  application  of  Benders 


^  E  min 

«  -T  1  ^  xex 

i=l 


provided  X’  =  0.  The  Monte  Carlo  lower  bound 
on  the  right-hand  side  of  Eq.  (4)  was  introduced. 


fV.-K  Mak  et  al.  I  Operations  Research  Letters  24  (1999)  47-56 


51 


decomposition  [35]: 

6 

1 


=  mm 

cx 

xGX 

s.t. 

e- 

k  = 

-  V 
11  ^ ^ 


!=1 


1 


1=1 


T^’'d 


where  (tc' 


ti 


(5) 

7t*^’ " ),  k=\,...,K,  denote  the  extreme 


points  of  the  Cartesian  product  of  the  second-stage 
dual  feasible  regions,  i.e.,  {n:nD^g}";  the  corre¬ 
sponding  constraints  are  called  “cuts”.  Cutting-plane 
algorithms  [35]  generate  a  sequence  of  cuts  and  lower 
bounds  based  on  the  resulting  relaxations.  For  com¬ 
putational  efficiency,  Higle  and  Sen  [14]  generate  a 
weaker  version  of  these  bounds  in  their  stochastic 
decomposition  algorithm  by  considering  a  restricted 
set  of  dual  extreme  points.  Termination  criteria  for 
the  stochastic  decomposition  algorithm  are  discussed 
in  ([16],  Section  5)  [14,  15].  Dantzig  and  Glynn  [4] 
and  Dantzig  and  Infanger  [6]  consider  a  related  lower 
bound,  and  associated  termination  criteria,  using  an 
independent  set  of  observations  for  each  of  the  cuts 
instead  of  the  common  set  of  samples  that  is  used  in 
Eq.  (5)  and  in  [14]. 


3.  Confidence  interval  constrnction  and  variance 
rednction 


This  section  develops  two  approaches  for  con¬ 
structing  confidence  intervals  on  the  optimality  gap 
Ef(x,  I)  —  z*  with  respect  to  a  candidate  solution  x. 


3.1.  Independent  random  number  streams 


Let  I'', . . . ,  I'",  be  i.i.d.  batches  of  ran¬ 

dom  vectors.  Defining 

I  n  1 

=  min  -  ^  f(x,  and  Z(«/)  =  —  ^  z*‘, 
x&x  n  ^ '  nr  ^ ' 

7=1  ,=1 


we  have 


s/n/[L(n/)  —  Ez*]  ^  N(0,  cr/)  as  ^  oo 

where  a}  =  varz*.  (6) 

Note  that  the  elements  j=\,...,n,  within  a  batch 
need  not  be  i.i.d.  A  more  general  estimator  (F  can  be 
used  to  define  z*‘  because  inter-batch  independence 


is  sufficient  to  ensure  that  z*',  i  =  are  i.i.d. 

As  described  after  Theorem  1,  we  simply  require  an 
estimator  with  E(F{x,  , . . . ,  =  i? f{x,  i). 

Let  4-1,0!  satisfy  =  1  —  a,  where  the 

random  variable  T„  has  a  t  distribution  with  n  —  \ 
degrees  of  freedom.  Let  s^{nf)  denote  the  standard 
sample  variance  estimator  of  aj,  let  be  the  number 
of  observations  used  to  estimate  £'/(i,  |),  and  define 

^  — 1,C(  “^uCUu)  ,  ~  4^— 1,0! 

8u  = - — -  and  £/  = - — - . 

In  this  strategy  of  “independent  random  number 
streams”,  the  upper-bound  estimate  on  z*  is  computed 
using  a  stream  of  observations  that  satis¬ 

fies  Eqs.  (2)  and  (3)  and  which  is  independent  from 
the  stream  used  for  the  lower-boimd  estimate.  Then, 
for  sufficiently  large  «u  and  n^,  we  appeal  to  the  re¬ 
spective  CLTs  (3)  and  (6).  These,  coupled  with  the 
Boole-Bonferroni  inequality  (e.g.  [24],  Section  9.7), 
Theorem  1 ,  and  the  fact  that  x  is  suboptimal,  yield 

P{L(n,)  -  e,  ^Ez:  ^z*  ^Efix,  I)  ^  U(n^)  +  £„} 
^l-P{L(n,)-s,^Ez:} 

-P{U{n,)  +  E^>Ef(x,0} 

«  1  -  2a.  (7) 

From  Eq.  (7)  we  may  infer  that 

[0,  D(«u)  -  EM  +  e/+  4] 

is  an  approximate  (1  —  2a)-level  confidence  interval 
for  the  optimality  gap  at  x.  Due  to  sampling  error  we 
may  actually  observe  U{nu)  <  L{n/;)  and  hence  we 
recommend  the  more  conservative  confidence  interval 

[0,[(7(nu)-Z(«^)]+  +  e7  +  eu],  (8) 

where  [y]^  =  maxjj,  0}. 

Despite  the  fact  that  the  upper-  and  lower-bound  es¬ 
timators  use  independent  random  number  streams,  the 
events  {L{n()-ef  ^Ez* }  and  { U («u)+4  >Ef{x,^)} 
might  not  be  independent,  and  thus  the  Boole- 
Bonferroni  inequality  is  utilized  in  Eq.  (7).  Such  a 
situation  would  arise  if  jc  is  constructed  from  one  or 
more  of  the  optimal  solutions  to  the  lower-boimding 
problems,  for  example,  via  a  convex  combination  of 
the  lower-boimding  optimizers.  Such  averaging  pro¬ 
cedures  are  valid  provided  X  is  convex  and  have  also 
been  used  in  stochastic  quasi-gradient  algorithms  to 
accelerate  convergence  (e.g.  [27],  Section  5.1.3). 
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3.2.  Common  random  number  streams 


Instead  of  developing  a  confidence  interval  for  the 
optimality  gap  by  estimating  E f{x,  |)  and  Ez^  sepa¬ 
rately,  as  in  the  “naive  sampling  strategy”  of  the  pre¬ 
vious  section,  observe  that  by  Theorem  1 


n  x£X  n 


/=1 


^Ef{x,^)-z*. 


(9) 


As  a  result,  we  may  use  a  batch-means  approach  to 
estimate  EG„  in  which  the  same  set  of  observations  is 
used  in  the  upper-  and  lower-bound  estimators  on  the 
left-hand  side  of  Eq.  (9).  This  is  an  application  of  the 
common  random  numbers  (CRN)  variance-reduction 
technique  (e.g.  [24]). 

Note  that  ^  0  so  that  negative  gap  estimates  can¬ 
not  arise  as  they  could  with  the  naive  sampling  strat¬ 
egy;  variance  reduction  over  that  technique  will  be 
obtained  if  the  upper-  and  lower-bound  estimators  are 
positively  correlated.  The  correlation  should  be  high 
when  X  and  x*  are  “close”,  and  this  should  occur  if  x 
is  of  high  quality  and  the  batch  size  n  is  sufficiently 
large.  Higle  and  Sen  ([16],  Section  5.2.1)  have  also 
cited  the  merits  of  constmcting  gap  estimators  using 
CRNs. 

In  analogous  fashion  to  the  previous  section,  let 
I'', . . . ,  i  =  1, . . . ,  «g,  be  i.i.d.  batches.  We  use  each 
batch  to  define  an  observation  of  the  optimality  gap, 
G;,  and  let  G(«g)  =  n” '  Then, 

ydi^[G(«g)  —  EGn\  N(0,  Cg  )  as  Wg  ^  oo 

where  =  var  G„.  (10) 

Let  5g(«g)  denote  the  sample  variance  estimator  of 
and  define 


Then,  [0,  G(«g)  -I-  Sg]  is  an  approximate  (1  —  a)-level 
confidence  interval  for  the  optimality  gap  at  x. 


4.  Computational  results 

This  section  applies  the  proposed  techniques  to 
solve  the  four  test  problems  described  in  Table  1. 


The  first  problem,  DBl,  is  a  stochastic  vehicle-alloca¬ 
tion  model  in  a  single-commodity  network  from 
Donohue  and  Birge  [7].  They  use  this  problem,  with 
fixed  first-stage  variables,  to  evaluate  a  deterministic 
bound.  We  allow  first-stage  variables  x  to  position 
a  fleet  of  vehicles  subject  to  a  “resource  constraint” 
jc  G  X  =  {a:|  ^jXi  =  h,Xi^0yi}.  (For  ease  of  so¬ 
lution,  we  treat  first-stage  decisions  as  continuous  in 
all  four  problems.)  The  second  problem,  WRPMl  [5, 
19],  is  an  electric  power  system  capacity-expansion 
model  with  imcertain  demand  forecasts  and  generator 
reliability.  The  third  problem,  20TERM  [13],  models 
a  motor  freight  carrier’s  operations:  First-stage  vari¬ 
ables  position  a  fleet  of  vehicles  at  the  beginning  of 
a  day  while  second-stage  decisions  move  the  fleet 
through  a  multi-commodity  network  to  (i)  satisfy 
point-to-point  demands  for  shipments,  and  (ii)  end  the 
day  with  a  fleet  configuration  matching  the  first-stage 
decision.  Penalized  violations  of  second-stage  re¬ 
quirements  are  allowed.  The  final  problem,  SSN 
[31],  originates  in  the  telecommunications  industry: 
First-stage  variables  expand  capacity  in  a  commu¬ 
nications  network,  and  second-stage  variables  route 
demands  for  point-to-point  commimication  through 
the  network.  All  four  problems  have  finite  discrete 
distributions  for  their  random  parameters.  In  all  cases, 
the  vector  d  (see  Eq.  (1))  is  stochastic  and  other 
problem  coefficients  are  deterministic,  except  that 
WRPMl  also  includes  a  stochastic  transition  matrix 
B  to  model  unreliable  electrical  generators. 

Table  2  displays  computational  results  for  the 
naive  solution  strategy  based  on  independent  ran¬ 
dom  number  streams  (see  Section  3.1).  In  calculating 
the  lower  bound,  =  30  independent  batches  are 
used  for  each  problem.  Each  batch  consists  of  «  =  25 
i.i.d.  observations  of  the  random  parameters,  except 
that  SSN  requires  a  significantly  larger  batch  size. 
The  upper-bounding  estimator  is  formed  using  i.i.d. 
observations  that  are  independent  of  those  used  in 
lower-bound  estimation.  The  upper  boimd  is  estimated 
with  respect  to  a  candidate  solution  x  =  nj^  Y17=\  K’’ 
where  the  x*‘  are  optimal  solutions  to  the  n/  respec¬ 
tive  approximating  problems.  The  sample  size  used 
for  the  upper-bounding  estimator  is  selected  to  yield 
an  error  estimate  8u  of  approximately  the  same  size 
as  except  for  WRPMl  where  the  computational 
effort  would  be  too  great;  for  this  problem  we  roughly 
balance  the  times  spent  computing  lower  and  upper 
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Table  1 

Test  problem  descriptions,  “rows”,  “cols.”  and  “nonzeros”  are,  respectively,  the  numbers  of  constraints, 
variables  and  nonzero  constraint  entries  n  the  associated  problems.  B  is  actually  deterministic  in  three  of  the 
problems 


Problem  1  st  stage 

2nd  stage 

Nonzeros 

in  Bx 

Dimension  Total 
of  ^  scenarios 

Rows  Cols. 

Nonzeros 

Rows  Cols. 

Nonzeros 

DBl  1  5 

5 

71  102 

255  5 

46 

4.5  X  10^5 

WRPMl  43  75 

107 

301  289 

865  42 

11 

5.6  X  10« 

20TERM  3  63 

63 

124  764 

4404  63 

40 

1.1  X  10*^ 

SSN  1  89 

89 

175  706 

2284  89 

86 

1.0  X  10™ 

Table  2 

Test  results  for  sampling  with  independent  random  number  streams 

Problem 

DBl 

WRPMl 

20TERM 

SSN 

Lower  bound 

Batch  size:  n 

25 

25 

25 

1000 

No.  of  batches: 

30 

30 

30 

30 

Point  estimate:  L(n^) 

-17  548 

288  865 

253446 

9.22 

Error  estimate  (a  —  0.975): 

189 

269 

885 

0.22 

CPU  minutes  (a) 

1.1 

8.9 

34.5 

2443 

Upper  bound 

Sample  size:  «u 

1000 

50  000 

20000 

100000 

Point  estimate:  U{nu) 

-17  661 

289204 

254425 

9.98 

Error  estimate  (a  —  0.975): 

fu 

139 

722 

752 

0.11 

CPU  minutes  (b) 

0.1 

10.1 

19.3 

120 

Optimality  gap 

Point  estimate:  [U(«u)  —  L(n^)]^ 

0 

338 

979 

0.76 

Confidence  interval  (95%) 

[0,328] 

[0,1329] 

[0,2616] 

[0,1.09] 

CPU  minutes  (a+b) 

1.2 

19.0 

53.8 

2563 

bounds.  As  described  in  Section  3.1,  negative  gap 
estimates  are  possible  when  the  positive-part  operator 
is  not  applied;  note  that  U(ria)  <  L(n/-)  for  DBl. 

Table  3  displays  the  computational  results  for  sam¬ 
pling  based  on  CRN  (see  Section  3.2).  Unlike  the 
naive  strategy,  a  candidate  solution  x  cannot  he  de¬ 
rived  from  the  approximating  problems  used  to  esti¬ 
mate  the  optimality  gap.  Therefore,  we  compute  jc  for 
the  CRN  strategy  by  solving  an  initial  approximating 
problem  that  has  twice  as  many  scenarios  as  used  for 
lower-bound  estimation.  The  CPU  times  reported  for 
gap  estimation  in  Table  3  are  slightly  longer  than  the 
lower-bound  times  in  Table  2  because  they  include 
(i)  the  time  to  solve  the  30  approximating  problems 


used  for  lower-bounding,  (ii)  the  solution  time  for  the 
initial  approximating  problem  used  to  generate  x,  and 
(iii)  the  time  for  calculating  the  upper  boimd  terms 
in  G„  (see  Eq.  (9)).  The  upper-bound  estimates  re¬ 
ported  in  Table  3  are  auxiliary  calculations  that  are 
not  necessary  to  generate  the  confidence  intervals  on 
optimality  gaps,  but  they  do  allow  for  a  comparison  of 
the  two  methods  of  generating  candidate  solutions  x. 
The  relevant  CPU  times  to  compare  for  the  two  strate¬ 
gies,  for  the  purpose  of  determining  solution  quality, 
are  the  “optimality  gap”  times  from  the  respective 
tables. 

On  all  problems  except  SSN,  very  tight  confidence 
intervals  on  the  optimality  gap  are  obtained  with 
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Table  3 

Test  results  for  sampling  with  common  random  numbers 


Problem 

DBl 

WRPMl 

20TERM 

SSN 

Sample  Size  n  in  SPn 

used  to  generate  x 

50 

50 

50 

2000 

Optimality  gap 

Batch  size 

25 

25 

25 

1000 

No.  of  batches:  «g 

30 

30 

30 

30 

Point  estimate:  G{n^) 

23 

197 

141 

0.69 

Error  estimate  (a  =  0.95):  eg 

5.1 

46.3 

45.8 

0.080 

Confidence  interval  (95%) 

[0,28] 

[0,243] 

[0,187] 

[0,0.77] 

Variance  reduction 

4100 

460 

1300 

17 

CPU  (minutes) 

1.2 

9.5 

37.1 

2616 

Upper  bound 

Sample  size: 

1000 

50000 

20000 

100000 

Confidence  interval  (95%) 

-17567  ±  114 

289186  ±  605 

254394  ±  635 

10.06  ±  0.12 

CPU  minutes 

0.1 

9.9 

19.7 

119 

modest  computational  effort.  The  CRN  strat¬ 
egy  yields  confidence  interval  widths  that  are 
0.2%,  0.08%,  0.07%,  and  8%  of  the  upper-boimd  esti¬ 
mates  for  DBl,  WRPMl,  20TERM,  and  SSN,  respec¬ 
tively.  The  CRN  sampling  strategy  yields  significant 
computational  savings  as  reflected  in  the  “variance  re¬ 
duction”  values  calculated  as  ((e/-|-£u)/eg)^  and  listed 
in  Table  3.  When  «  Sy,  this  quantity  gives  the  ap¬ 
proximate  multiplicative  factor  by  which  sample  sizes 
for  the  naive  strategy  must  be  increased  to  achieve 
the  confidence  interval  width  of  the  CRN  strategy. 

In  applying  either  sampling  strategy,  we  must  solve 
a  set  of  two-stage  stochastic  LPs.  To  do  so,  we  use 
the  regularized  decomposition  (RD)  algorithm  [29] 
which  is  a  cutting-plane  algorithm  whose  master  pro¬ 
gram  contains  a  quadratic  proximal  term.  RD  tends 
to  converge  more  quickly  than  standard  cutting-plane 
methods  and  can  better  exploit  good  starting  solutions 
to  speed  convergence  [30].  In  solving  DBl,  WRPMl, 
and  20TERM,  we  apply  our  implementation  of  RD, 
which  uses  IBM’s  Optimization  Subroutine  Library 
[18]  to  solve  LP  subproblems  and  LSSOL  [12]  to 
solve  the  quadratic  master  program.  Our  code  cannot 
solve  SSN  in  a  reasonable  amount  of  time,  however, 
so  for  this  problem  we  use  the  RD  implementation 
of  Ruszczyhski  and  Swietanowski  [30].  (As  explained 
below,  our  code  does  solve  the  first  three  problems 
faster  than  the  Ruszczyhski  and  Swietanowski  code.) 


In  solving  the  sequence  of  approximating  problems 
with  our  RD  algorithm,  we  accelerate  solutions  by  us¬ 
ing  the  average  of  the  optimal  solutions  from  previous 
problems  as  the  starting  proximal  point  for  the  next 
problem.  For  comparison,  we  also  solved  each  of  the 
30  approximating  problems  without  this  enhancement: 
Running  times  increase  by  25%,  100%,  and  330%  for 
DBl,  WRPMl,  and  20TERM,  respectively.  It  is  imfair 
to  compare  running  times  of  our  code  to  the  times  for 
the  Ruszczyhski  and  Swietanowski  code  because  their 
code  uses  more  stringent  stopping  tolerances  (10~^ 
versus  10^"^),  and  it  was  not  possible  to  apply  the 
“starting  point  enhancement”  in  their  code.  However, 
the  “lower-boimd”  times  reported  in  Table  2  for  DBl, 
WRPMl,  and  20TERM  are  faster  than  their  code  by 
factors  of  2.2, 9,  and  1 .5,  respectively.  Thus,  it  is  likely 
that  the  solution  time  for  SSN  can  be  substantially  im¬ 
proved. 


5.  Conclusions  and  extensions 

We  have  shown  that  the  solution  value  to  a  standard 
approximating  problem  (SP„)  for  a  two-stage  stochas¬ 
tic  program  (SP)  yields  a  lower  bound,  in  expectation, 
on  the  solution  value  of  SP.  This  result  has  been  ex¬ 
ploited,  in  a  batch-means  approach,  to  develop  con¬ 
fidence  intervals  on  the  optimality  gap  with  respect 
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to  any  candidate  solution  to  SP.  Computational  effi¬ 
ciency  is  improved  by  using  common  random  number 
(CRN)  streams  for  gap  estimation  and  by  using  regu¬ 
larized  decomposition. 

Because  the  lower-bounding  result  is  so  simple 
and  so  general,  confidence  intervals  may  be  ob¬ 
tained  for  two-stage  stochastic  programs  with  general 
structure,  e.g.,  with  integer  first-  or  second-stage  vari¬ 
ables,  with  randomness  in  any  of  the  second-stage 
parameters,  etc.  Although  not  developed  in  this  pa¬ 
per,  the  lower-bounding  result  extends  to  multi-stage 
stochastic  programs  through  a  straightforward,  re¬ 
cursive  application  of  the  proof  of  Theorem  1 .  The 
intuition  regarding  the  multi-stage  result  is  similar: 
If  we  optimize  over  a  subset  of  the  possible  futures 
at  each  branch  in  the  scenario  tree,  we  will  obtain 
an  optimistic  result,  on  average.  Extending  the  tech¬ 
niques  described  here  to  the  multi-stage  problem  is 
an  important  area  for  future  research. 

The  computational  efficiency  of  our  proposed  meth¬ 
ods  can  certainly  be  improved.  We  have  examined 
variance  reduction  using  CRNs,  but  other  techniques 
should  be  explored.  Regularized  decomposition  allows 
us  to  exploit  good  starting  solutions  to  solve  m  in¬ 
stances  of  SP„  faster  than  m  times  the  time  required  to 
solve  a  single  instance,  but  other  techniques  may  also 
speed  solutions.  For  instance,  tight  cuts  from  a  solu¬ 
tion  of  one  instance  of  SP„  might  be  used  (temporarily) 
to  help  solve  another  instance.  Also,  m  approximat¬ 
ing  problems  can  obviously  be  solved  on  m  parallel 
processors  with  a  near-linear  improvement  in  speed. 

Under  certain  assumptions,  most  notably  that  SP 
have  a  unique  optimal  solution,  —  z*)  is 

asymptotically  normal  with  mean  zero  and  variance 
var/(jc*,|)  [33].  (There  are  also  related  results  con¬ 
cerning  the  stochastic  order  of  convergence  [28], 
Section  6).  Under  this  hypothesis,  our  batch-means 
approach  can  be  simplified:  Confidence  intervals  on 
solution  quality  can  be  constructed  by  solving  a  single 
approximating  problem.  Infanger’s  [20]  approach  to 
constructing  such  confidence  intervals  in  the  context 
of  a  cuhing-plane  algorithm  shows  promise. 
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